Overview

Dataset statistics

Number of variables: 15
Number of observations: 32561
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 24
Duplicate rows (%): 0.1%
Total size in memory: 3.7 MiB
Average record size in memory: 120.0 B

Variable types

NUM: 13
BOOL: 2

Reproduction

Analysis started: 2021-07-04 18:22:32.966098
Analysis finished: 2021-07-04 18:23:08.990971
Duration: 36.02 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 24 (0.1%) duplicate rows [Duplicates]
workclass has 1836 (5.6%) zeros [Zeros]
education has 933 (2.9%) zeros [Zeros]
marital.status has 4443 (13.6%) zeros [Zeros]
occupation has 1843 (5.7%) zeros [Zeros]
relationship has 13193 (40.5%) zeros [Zeros]
capital.gain has 29849 (91.7%) zeros [Zeros]
capital.loss has 31042 (95.3%) zeros [Zeros]
native.country has 583 (1.8%) zeros [Zeros]
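
The percentages in these warnings appear to be each count divided by the 32561 observations, shown to one decimal place. The sketch below recomputes a few of them from figures in this report. The `percent` helper and its rounding rule are assumptions for illustration, not pandas-profiling's own code.

```python
# Number of observations, taken from the dataset statistics in this report.
N_OBSERVATIONS = 32561

# Counts copied from the warnings above.
warning_counts = {
    "duplicate rows": 24,
    "workclass zeros": 1836,
    "relationship zeros": 13193,
    "capital.gain zeros": 29849,
    "capital.loss zeros": 31042,
}


def percent(count, total=N_OBSERVATIONS):
    """Share of `total` as a percentage, rounded to one decimal (assumed rule)."""
    return round(100 * count / total, 1)


percentages = {name: percent(count) for name, count in warning_counts.items()}

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

Each recomputed value matches the percentage printed in the corresponding warning.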

Variables

age
Real number (ℝ≥0)

Distinct count: 73
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.58164675532078
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 17
5-th percentile: 19
Q1: 28
median: 37
Q3: 48
95-th percentile: 63
Maximum: 90
Range: 73
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.64043255
Coefficient of variation (CV): 0.3535471837
Kurtosis: -0.1661274596
Mean: 38.58164676
Median Absolute Deviation (MAD): 10
Skewness: 0.5587433694
Sum: 1256257
Variance: 186.0614002
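
Several of these statistics follow from others by simple arithmetic. As a rough cross-check, the sketch below derives the range, IQR, coefficient of variation, variance and mean for age from the figures reported above. The variable names are illustrative, and the formulas are the standard textbook definitions rather than code taken from pandas-profiling.

```python
# Figures copied from the age statistics in this report.
minimum, maximum = 17, 90
q1, q3 = 28, 48
mean = 38.58164676
std = 13.64043255
total = 1256257           # "Sum" in the report
n_observations = 32561    # from the dataset statistics

value_range = maximum - minimum        # Range
iqr = q3 - q1                          # Interquartile range
cv = std / mean                        # Coefficient of variation
variance = std ** 2                    # Variance is the squared standard deviation
mean_from_sum = total / n_observations # Mean is the sum divided by the row count

print(value_range, iqr)
print(round(cv, 6), round(variance, 4), round(mean_from_sum, 6))
```

The derived values agree with the reported Range (73), IQR (20), CV, Variance and Mean to the precision shown.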
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
36                 898    2.8%
31                 888    2.7%
34                 886    2.7%
23                 877    2.7%
35                 876    2.7%
33                 875    2.7%
28                 867    2.7%
30                 861    2.6%
37                 858    2.6%
25                 841    2.6%
Other values (63)  23834  73.2%

Value  Count  Frequency (%)
17     395    1.2%
18     550    1.7%
19     712    2.2%
20     753    2.3%
21     720    2.2%

Value  Count  Frequency (%)
90     43     0.1%
88     3      < 0.1%
87     1      < 0.1%
86     1      < 0.1%
85     3      < 0.1%

workclass
Real number (ℝ≥0)

ZEROS

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8688922330395257
Minimum: 0
Maximum: 8
Zeros: 1836
Zeros (%): 5.6%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
median: 4
Q3: 4
95-th percentile: 6
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.455959761
Coefficient of variation (CV): 0.3763247134
Kurtosis: 1.682386955
Mean: 3.868892233
Median Absolute Deviation (MAD): 0
Skewness: -0.752024012
Sum: 125975
Variance: 2.119818825

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
4      22696  69.7%
6      2541   7.8%
2      2093   6.4%
0      1836   5.6%
7      1298   4.0%
5      1116   3.4%
1      960    2.9%
8      14     < 0.1%
3      7      < 0.1%

Value  Count  Frequency (%)
0      1836   5.6%
1      960    2.9%
2      2093   6.4%
3      7      < 0.1%
4      22696  69.7%

Value  Count  Frequency (%)
8      14     < 0.1%
7      1298   4.0%
6      2541   7.8%
5      1116   3.4%
4      22696  69.7%

fnlwgt
Real number (ℝ≥0)

Distinct count: 21648
Unique (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189778.36651208502
Minimum: 12285
Maximum: 1484705
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 12285
5-th percentile: 39460
Q1: 117827
median: 178356
Q3: 237051
95-th percentile: 379682
Maximum: 1484705
Range: 1472420
Interquartile range (IQR): 119224

Descriptive statistics

Standard deviation: 105549.9777
Coefficient of variation (CV): 0.5561749721
Kurtosis: 6.218810978
Mean: 189778.3665
Median Absolute Deviation (MAD): 59894
Skewness: 1.446980095
Sum: 6179373392
Variance: 1.114079779e+10

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
123011                13     < 0.1%
203488                13     < 0.1%
164190                13     < 0.1%
126675                12     < 0.1%
121124                12     < 0.1%
113364                12     < 0.1%
148995                12     < 0.1%
123983                11     < 0.1%
190290                11     < 0.1%
241998                11     < 0.1%
Other values (21638)  32441  99.6%

Value    Count  Frequency (%)
12285    1      < 0.1%
13769    1      < 0.1%
14878    1      < 0.1%
18827    1      < 0.1%
19214    1      < 0.1%

Value    Count  Frequency (%)
1484705  1      < 0.1%
1455435  1      < 0.1%
1366120  1      < 0.1%
1268339  1      < 0.1%
1226583  1      < 0.1%

education
Real number (ℝ≥0)

ZEROS

Distinct count: 16
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.298209514449802
Minimum: 0
Maximum: 15
Zeros: 933
Zeros (%): 2.9%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 9
median: 11
Q3: 12
95-th percentile: 15
Maximum: 15
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.870263951
Coefficient of variation (CV): 0.3758191116
Kurtosis: 0.6806551901
Mean: 10.29820951
Median Absolute Deviation (MAD): 2
Skewness: -0.9340424374
Sum: 335320
Variance: 14.97894305

Histogram with fixed size bins (bins=10)

Value             Count  Frequency (%)
11                10501  32.3%
15                7291   22.4%
9                 5355   16.4%
12                1723   5.3%
8                 1382   4.2%
1                 1175   3.6%
7                 1067   3.3%
0                 933    2.9%
5                 646    2.0%
14                576    1.8%
Other values (6)  1912   5.9%

Value  Count  Frequency (%)
0      933    2.9%
1      1175   3.6%
2      433    1.3%
3      168    0.5%
4      333    1.0%

Value  Count  Frequency (%)
15     7291   22.4%
14     576    1.8%
13     51     0.2%
12     1723   5.3%
11     10501  32.3%

education.num
Real number (ℝ≥0)

Distinct count: 16
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.0806793403151
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 9
median: 10
Q3: 12
95-th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.572720332
Coefficient of variation (CV): 0.2552129916
Kurtosis: 0.6234440748
Mean: 10.08067934
Median Absolute Deviation (MAD): 1
Skewness: -0.3116758679
Sum: 328237
Variance: 6.618889907

Histogram with fixed size bins (bins=10)

Value             Count  Frequency (%)
9                 10501  32.3%
10                7291   22.4%
13                5355   16.4%
14                1723   5.3%
11                1382   4.2%
7                 1175   3.6%
12                1067   3.3%
6                 933    2.9%
4                 646    2.0%
15                576    1.8%
Other values (6)  1912   5.9%

Value  Count  Frequency (%)
1      51     0.2%
2      168    0.5%
3      333    1.0%
4      646    2.0%
5      514    1.6%

Value  Count  Frequency (%)
16     413    1.3%
15     576    1.8%
14     1723   5.3%
13     5355   16.4%
12     1067   3.3%

marital.status
Real number (ℝ≥0)

ZEROS

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.6118362458155464
Minimum: 0
Maximum: 6
Zeros: 4443
Zeros (%): 13.6%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 2
Q3: 4
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.506221723
Coefficient of variation (CV): 0.5766907192
Kurtosis: -0.5360804148
Mean: 2.611836246
Median Absolute Deviation (MAD): 2
Skewness: -0.01350813803
Sum: 85044
Variance: 2.268703879

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
2      14976  46.0%
4      10683  32.8%
0      4443   13.6%
5      1025   3.1%
6      993    3.0%
3      418    1.3%
1      23     0.1%

Value  Count  Frequency (%)
0      4443   13.6%
1      23     0.1%
2      14976  46.0%
3      418    1.3%
4      10683  32.8%

Value  Count  Frequency (%)
6      993    3.0%
5      1025   3.1%
4      10683  32.8%
3      418    1.3%
2      14976  46.0%

occupation
Real number (ℝ≥0)

ZEROS

Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.572740394951015
Minimum: 0
Maximum: 14
Zeros: 1843
Zeros (%): 5.7%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 7
Q3: 10
95-th percentile: 13
Maximum: 14
Range: 14
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.228856803
Coefficient of variation (CV): 0.6433932499
Kurtosis: -1.234720733
Mean: 6.572740395
Median Absolute Deviation (MAD): 4
Skewness: 0.1145833164
Sum: 214015
Variance: 17.88322986

Histogram with fixed size bins (bins=10)

Value             Count  Frequency (%)
10                4140   12.7%
3                 4099   12.6%
4                 4066   12.5%
1                 3770   11.6%
12                3650   11.2%
8                 3295   10.1%
7                 2002   6.1%
0                 1843   5.7%
14                1597   4.9%
6                 1370   4.2%
Other values (5)  2729   8.4%

Value  Count  Frequency (%)
0      1843   5.7%
1      3770   11.6%
2      9      < 0.1%
3      4099   12.6%
4      4066   12.5%

Value  Count  Frequency (%)
14     1597   4.9%
13     928    2.9%
12     3650   11.2%
11     649    2.0%
10     4140   12.7%

relationship
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4463622124627622
Minimum: 0
Maximum: 5
Zeros: 13193
Zeros (%): 40.5%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.60677095
Coefficient of variation (CV): 1.110904956
Kurtosis: -0.7683583398
Mean: 1.446362212
Median Absolute Deviation (MAD): 1
Skewness: 0.7868177781
Sum: 47095
Variance: 2.581712887

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
0      13193  40.5%
1      8305   25.5%
3      5068   15.6%
4      3446   10.6%
5      1568   4.8%
2      981    3.0%

Value  Count  Frequency (%)
0      13193  40.5%
1      8305   25.5%
2      981    3.0%
3      5068   15.6%
4      3446   10.6%

Value  Count  Frequency (%)
5      1568   4.8%
4      3446   10.6%
3      5068   15.6%
2      981    3.0%
1      8305   25.5%

race
Real number (ℝ≥0)

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6658579281963086
Minimum: 0
Maximum: 4
Zeros: 311
Zeros (%): 1.0%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 4
median: 4
Q3: 4
95-th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8488056043
Coefficient of variation (CV): 0.2315435079
Kurtosis: 4.876310395
Mean: 3.665857928
Median Absolute Deviation (MAD): 0
Skewness: -2.435386267
Sum: 119364
Variance: 0.7204709539

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
4      27816  85.4%
2      3124   9.6%
1      1039   3.2%
0      311    1.0%
3      271    0.8%

Value  Count  Frequency (%)
0      311    1.0%
1      1039   3.2%
2      3124   9.6%
3      271    0.8%
4      27816  85.4%

Value  Count  Frequency (%)
4      27816  85.4%
3      271    0.8%
2      3124   9.6%
1      1039   3.2%
0      311    1.0%

sex
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.4 KiB

Value  Count  Frequency (%)
1      21790  66.9%
0      10771  33.1%

capital.gain
Real number (ℝ≥0)

ZEROS

Distinct count: 119
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1077.6488437087312
Minimum: 0
Maximum: 99999
Zeros: 29849
Zeros (%): 91.7%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7385.292085
Coefficient of variation (CV): 6.853152702
Kurtosis: 154.7994379
Mean: 1077.648844
Median Absolute Deviation (MAD): 0
Skewness: 11.95384769
Sum: 35089324
Variance: 54542539.18

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   29849  91.7%
15024               347    1.1%
7688                284    0.9%
7298                246    0.8%
99999               159    0.5%
5178                97     0.3%
3103                97     0.3%
4386                70     0.2%
5013                69     0.2%
8614                55     0.2%
Other values (109)  1288   4.0%

Value  Count  Frequency (%)
0      29849  91.7%
114    6      < 0.1%
401    2      < 0.1%
594    34     0.1%
914    8      < 0.1%

Value  Count  Frequency (%)
99999  159    0.5%
41310  2      < 0.1%
34095  5      < 0.1%
27828  34     0.1%
25236  11     < 0.1%

capital.loss
Real number (ℝ≥0)

ZEROS

Distinct count: 92
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.303829734959
Minimum: 0
Maximum: 4356
Zeros: 31042
Zeros (%): 95.3%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 402.9602186
Coefficient of variation (CV): 4.615607584
Kurtosis: 20.37680171
Mean: 87.30382973
Median Absolute Deviation (MAD): 0
Skewness: 4.594629122
Sum: 2842700
Variance: 162376.9378

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
0                  31042  95.3%
1902               202    0.6%
1977               168    0.5%
1887               159    0.5%
1848               51     0.2%
1485               51     0.2%
2415               49     0.2%
1602               47     0.1%
1740               42     0.1%
1590               40     0.1%
Other values (82)  710    2.2%

Value  Count  Frequency (%)
0      31042  95.3%
155    1      < 0.1%
213    4      < 0.1%
323    3      < 0.1%
419    3      < 0.1%

Value  Count  Frequency (%)
4356   3      < 0.1%
3900   2      < 0.1%
3770   2      < 0.1%
3683   2      < 0.1%
3004   2      < 0.1%

hours.per.week
Real number (ℝ≥0)

Distinct count: 94
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.437455852092995
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 40
median: 40
Q3: 45
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.34742868
Coefficient of variation (CV): 0.3053463286
Kurtosis: 2.916686796
Mean: 40.43745585
Median Absolute Deviation (MAD): 3
Skewness: 0.2276425368
Sum: 1316684
Variance: 152.4589951

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
40                 15217  46.7%
50                 2819   8.7%
45                 1824   5.6%
60                 1475   4.5%
35                 1297   4.0%
20                 1224   3.8%
30                 1149   3.5%
55                 694    2.1%
25                 674    2.1%
48                 517    1.6%
Other values (84)  5671   17.4%

Value  Count  Frequency (%)
1      20     0.1%
2      32     0.1%
3      39     0.1%
4      54     0.2%
5      60     0.2%

Value  Count  Frequency (%)
99     85     0.3%
98     11     < 0.1%
97     2      < 0.1%
96     5      < 0.1%
95     2      < 0.1%

native.country
Real number (ℝ≥0)

ZEROS

Distinct count: 42
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.718866128190164
Minimum: 0
Maximum: 41
Zeros: 583
Zeros (%): 1.8%
Memory size: 254.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 19
Q1: 39
median: 39
Q3: 39
95-th percentile: 39
Maximum: 41
Range: 41
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7.823781904
Coefficient of variation (CV): 0.2130725354
Kurtosis: 12.53305268
Mean: 36.71886613
Median Absolute Deviation (MAD): 0
Skewness: -3.658303295
Sum: 1195603
Variance: 61.21156328

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
39                 29170  89.6%
26                 643    2.0%
0                  583    1.8%
30                 198    0.6%
11                 137    0.4%
2                  121    0.4%
33                 114    0.4%
8                  106    0.3%
19                 100    0.3%
5                  95     0.3%
Other values (32)  1294   4.0%

Value  Count  Frequency (%)
0      583    1.8%
1      19     0.1%
2      121    0.4%
3      75     0.2%
4      59     0.2%

Value  Count  Frequency (%)
41     16     < 0.1%
40     67     0.2%
39     29170  89.6%
38     19     0.1%
37     18     0.1%

income
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.4 KiB

Value  Count  Frequency (%)
0      24720  75.9%
1      7841   24.1%

Interactions

[169 interaction plots (Matplotlib v3.3.4 SVG images) were rendered here.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
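
The calculation just described can be sketched in a few lines of plain Python. This is an illustration only: the function name `pearson_r` and the sample lists are hypothetical and not part of the report, and the sketch does not claim to reproduce how pandas-profiling computes its matrix.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Note: a constant input has zero standard deviation, so r is undefined there.
    return cov / (sd_x * sd_y)


# Hypothetical illustrative data, not taken from the dataset.
hours = [20, 35, 40, 45, 60]
pay = [3 * h + 10 for h in hours]        # exactly linear in hours
rescaled = [100 * p - 7 for p in pay]    # change of location and scale

print(pearson_r(hours, pay))
print(pearson_r(hours, rescaled))
```

Both calls give the same r up to floating-point error, which illustrates the invariance under changes in location and scale.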

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
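
As a rough sketch of that recipe, the code below replaces each value by its rank and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is an assumption made here (a common convention). The helper names and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Correlation of the rank variables of xs and ys."""
    return correlation(average_ranks(xs), average_ranks(ys))


# Hypothetical data: monotonic but strongly nonlinear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(spearman_rho(xs, ys))   # rank-based coefficient
print(correlation(xs, ys))    # linear coefficient on the raw values
```

Because the cubic relationship is monotonic, the rank-based coefficient reaches its maximum while the linear coefficient on the raw values stays below it.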

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
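
The pair-counting definition above can be sketched directly. This is the plain form of the coefficient (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations may apply a tie correction, so their results can differ on data with ties. The function name and sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: three pairs, two concordant and one discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

With two concordant pairs and one discordant pair out of three, the result is (2 - 1) / 3.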

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Two missing-values plots (Matplotlib v3.3.4 SVG images) were rendered here.]

Sample

First rows

index  age  workclass  fnlwgt  education  education.num  marital.status  occupation  relationship  race  sex  capital.gain  capital.loss  hours.per.week  native.country  income
0      90   0          77053   11         9              6               0           1             4     0    0             4356          40              39              0
1      82   4          132870  11         9              6               4           1             4     0    0             4356          18              39              0
2      66   0          186061  15         10             6               0           4             2     0    0             4356          40              39              0
3      54   4          140359  5          4              0               7           4             4     0    0             3900          40              39              0
4      41   4          264663  15         10             5               10          3             4     0    0             3900          40              39              0
5      34   4          216864  11         9              0               8           4             4     0    0             3770          45              39              0
6      38   4          150601  0          6              5               1           4             4     1    0             3770          40              39              0
7      74   7          88638   10         16             4               10          2             4     0    0             3683          20              39              1
8      68   1          422013  11         9              0               10          1             4     0    0             3683          40              39              0
9      41   4          70037   15         10             4               3           4             4     1    0             3004          60              0               1

Last rows

index  age  workclass  fnlwgt  education  education.num  marital.status  occupation  relationship  race  sex  capital.gain  capital.loss  hours.per.week  native.country  income
32551  43   6          27242   15         10             2               3           0             4     1    0             0             50              39              0
32552  32   4          34066   0          6              2               6           0             0     1    0             0             40              39              0
32553  43   4          84661   8          11             2               12          0             4     1    0             0             45              39              0
32554  32   4          116138  12         14             4               13          1             1     1    0             0             11              36              0
32555  53   4          321865  12         14             2               4           0             4     1    0             0             40              39              1
32556  22   4          310152  15         10             4               11          1             4     1    0             0             40              39              0
32557  27   4          257302  7          12             2               13          5             4     0    0             0             38              39              0
32558  40   4          154374  11         9              2               7           0             4     1    0             0             40              39              1
32559  58   4          151910  11         9              6               1           4             4     0    0             0             40              39              0
32560  22   4          201490  11         9              4               1           3             4     1    0             0             20              39              0

Duplicate rows

Most frequent

index  age  workclass  fnlwgt  education  education.num  marital.status  occupation  relationship  race  sex  capital.gain  capital.loss  hours.per.week  native.country  income  count
8      25   4          195994  3          2              4               9           1             4     0    0             0             40              13              0       3
0      19   4          97261   11         9              4               5           1             4     1    0             0             40              39              0       2
1      19   4          138153  15         10             4               1           3             4     0    0             0             10              39              0       2
2      19   4          146679  15         10             4               4           3             2     1    0             0             30              39              0       2
3      19   4          251579  15         10             4               8           3             4     1    0             0             14              39              0       2
4      20   4          107658  15         10             4               13          1             4     0    0             0             10              39              0       2
5      21   4          243368  13         1              4               5           1             4     1    0             0             50              26              0       2
6      21   4          250051  15         10             4               10          3             4     0    0             0             10              39              0       2
7      23   4          240137  4          3              4               6           1             4     1    0             0             55              26              0       2
9      25   4          308144  9          13             4               3           1             4     1    0             0             40              26              0       2